EXTRACTING SHAPE INFORMATION CONTAINED IN CELL IMAGES

BACKGROUND OF THE INVENTION 

The present invention pertains to image analysis methods used to extract shape
information from images of cells.

Cell shape is a recognized indicator of cell type and/or condition. Thus, it is
possible to identify many cell types and to distinguish them from other cell types
based on the cells' shapes. In addition, many interesting biological conditions may be
correlated with cell shape. Biological "conditions" of interest to researchers include
disease states, normal unperturbed states, quiescent states, states induced by
exogenous biologically-active agents, etc. Valuable insight may be gained by
inducing a biological condition through a genetic manipulation, exposure to a
particular agent (e.g., a compound, radiation, a field, etc.), deprivation of a required
substance, and other perturbations. Such a condition may cause changes in a
particular cell's shape, and the cell's modified shape may be indicative of that
particular condition. In this regard, there are known correlations between cell shape
and cell condition. Agents that affect the cytoskeleton, adhesion, regulatory signaling
pathways, or the cell cycle cause significant and specific changes in cell morphology.

In drug discovery work, valuable information can be obtained by
understanding how a potential therapeutic agent affects a cell. This information may
give some indication of the mechanism of action associated with the compound. At
present, little or no formal effort has been made to correlate cell condition with cell
shape. To the extent that cell shape analyses are performed at all, the results are
typically reported in qualitative terms based on observations of cells using various
microscopy techniques. Given that there are some known correlations between cell
shape and cell condition, the ability to quickly determine whether a population of cells
has a particular modified shape could provide a valuable tool in assessing the
mechanism of action of an uncharacterized compound that has been tested on the
population of cells. Therefore, it would be desirable to have improved techniques for
analysis of cell shapes.

SUMMARY OF THE INVENTION

The present invention addresses this need by providing methods and apparatus
for the analysis of images of cells and extraction of biologically-significant
shape-related features from the cell images. The extracted features may be correlated with
particular conditions induced by biologically-active agents with which cells have been
treated, thereby enabling the automated analysis of cells based on cell shape
parameters. In particular, the invention provides methods for segmentation of cells in
an image using data from a plurality of separate images of different cell components.
One application of the invention involves the use of a reference cell component
(preferably one that has been previously segmented and therefore one whose
segmentation parameters are well understood and may be repeated) in combination
with image data for a second component to perform a segmentation of another cell
component or the whole cell. This application of the invention is particularly
effective when the reference component has been previously segmented and is present
in a single copy in the cell, such as the nucleus, centrosome, specific chromosome,
Golgi complex, etc. The invention further provides techniques for extraction of
biologically-relevant shape-related cell features from segmented cell images.

In accordance with the present invention, image data for a reference cell
component, preferably present in the cell in a single copy and/or previously
segmented (for example, cell nuclei), are processed together with image data for a cell
shape-indicative marker (for example, cytoskeletal components (e.g., tubulin), one or
more cytoplasmic proteins (for example, lactate dehydrogenase or total cell protein),
or membrane components (e.g., lipids or plasma membrane receptors)) in a watershed
technique. Further, the invention provides a skeletonization and skeleton analysis
technique for extracting biologically-relevant shape-related features from the
segmented images.

One aspect of this invention pertains to a method of identifying boundaries of
biological cells. The method involves receiving a first image of a field of one or more
cells in which a reference cell component of the one or more cells is identified by a
reference cell component marker image parameter, receiving a second image of the
field of one or more cells in which at least one cell shape-indicative marker of the
one or more cells is identified by a cell shape-indicative marker image parameter, and
processing the first image in conjunction with the second image so that individual cell
boundaries for the one or more cells in the field are identified. The image processing
may involve segmenting the one or more reference cell components in the first image
to generate a digital representation of the first image, called a reference cell
component mask, thresholding the cell shape-indicative marker in the second image to
generate a digital representation of the second image including a cell shape-indicative
marker portion and a background portion, conceptually registering the nuclei mask
with the digital representation of the second image, and applying a watershed
algorithm to data provided by the conceptually registered reference component mask
and digital representation of the second image such that individual cell boundaries for
the one or more cells in the field are identified. In addition, the method may further
involve extracting biologically-significant shape-related information from the field of
one or more cells.

Another aspect of the invention pertains to a method of extracting
biologically-significant shape-related information from a field of one or more cells.
The method involves providing a segmented image of the field of one or more
segmented cells, where the boundaries of the one or more segmented cells have been
ascertained by the segmentation. For each of one or more of the cells in the
segmented cell image, two endpoints defining two parts of the boundary of the one or
more cells are selected. For each part of the cell's boundary, the method involves
computing the distance from each point on the part of the cell boundary to a line
between the endpoints, determining a point d_MAX on the part of the boundary, the
point d_MAX being maximally distant from the line, and comparing the distance from
the point d_MAX to a predetermined threshold distance value, d_TH. Where d_MAX is
greater than d_TH, the line between the endpoints is discarded, point d_MAX is used
as a new endpoint together with one of the original endpoints to separate the part
into two new parts, and the process from determining a point d_MAX is repeated.
Where d_MAX is less than d_TH, the line is used as a side of a polygon approximating
the cell shape, and the process continues until a polygon approximating the shape of
the cell is complete. When the polygon is complete, it is
skeletonized, and end points and/or nodes are calculated for the polygon
approximation of the cell.
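
By way of illustration only, the recursive splitting just described may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names (`point_line_distance`, `approximate_part`, `approximate_cell_polygon`), the representation of the cell boundary as an ordered list of (x, y) pixel coordinates, and the treatment of a distance exactly equal to d_TH as acceptable are all hypothetical choices made for the example. The subsequent skeletonization step is not shown.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate line: fall back to the distance between the two points.
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def approximate_part(points, d_th):
    """Approximate one open boundary part by polygon vertices.

    points: ordered boundary points; the first and last are the endpoints.
    Returns the vertices from the first endpoint to the last, inclusive.
    """
    a, b = points[0], points[-1]
    if len(points) < 3:
        return [a, b]
    # Find the interior boundary point maximally distant from the line a-b.
    idx, d_max = max(
        ((i, point_line_distance(points[i], a, b))
         for i in range(1, len(points) - 1)),
        key=lambda item: item[1],
    )
    if d_max > d_th:
        # Discard line a-b; the farthest point becomes a new endpoint.
        left = approximate_part(points[: idx + 1], d_th)
        right = approximate_part(points[idx:], d_th)
        return left[:-1] + right
    # Line a-b is close enough to the boundary: keep it as a polygon side.
    return [a, b]


def approximate_cell_polygon(boundary, i0, i1, d_th):
    """Approximate a closed boundary, split at indices i0 < i1, by a polygon."""
    part1 = boundary[i0 : i1 + 1]
    part2 = boundary[i1:] + boundary[: i0 + 1]
    poly1 = approximate_part(part1, d_th)
    poly2 = approximate_part(part2, d_th)
    # Drop the duplicated shared endpoints when joining the two halves.
    return poly1[:-1] + poly2[:-1]
```

In this sketch, a larger d_TH yields a coarser polygon with fewer sides, which is one way the approximation could be adjusted before end points and nodes are computed.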

Still another aspect of the invention pertains to a method of correlating a cell's
shape with a biological condition of the cell. The method involves providing a
plurality of segmented images of fields of one or more segmented cells, where at least
one of the fields has been treated with a biologically active agent and at least one of
the fields is a control having not been treated with the biologically active agent. The
boundaries of the one or more segmented cells have been ascertained by the
segmentation. For each of one or more of the cells in the plurality of segmented cell
images, two endpoints defining two parts of the boundary of the one or more cells are
selected. For each part of the cell's boundary, the method involves computing the
distance from each point on the part of the cell boundary to a line between the
endpoints, determining a point d_MAX on the part of the boundary, the point d_MAX
being maximally distant from the line, and comparing the distance from the point
d_MAX to a predetermined threshold distance value, d_TH. Where d_MAX is greater
than d_TH, the line between the endpoints is discarded, point d_MAX is used as a new
endpoint together with one of the original endpoints to separate the part into two new
parts, and the process from determining a point d_MAX is repeated. Where d_MAX is
less than d_TH, the line is used as a side of a polygon approximating the cell shape,
and the process continues until a polygon approximating the shape of the
cell is complete. When the polygon is complete, it is skeletonized, and end points
and/or nodes are calculated for the polygon approximation of the cell. The
computations of end points and/or nodes for the polygon approximation of the cell are
then compared to identify significant shape differences between the treated and
control fields of one or more cells.

Yet another aspect of the present invention pertains to an image analysis
apparatus for identifying individual biological cells in a field of cells. The apparatus
includes a memory or buffer adapted to store, at least temporarily, a first image of a
field of one or more cells, in which a reference cell component of the one or more
cells is identified by a reference cell component marker image parameter, and a
second image of the field of one or more cells, in which at least one cell
shape-indicative marker of the one or more cells is identified by a cell shape-indicative
marker image parameter. The apparatus further includes a processor configured or
designed to process the first image in conjunction with the second image such that
individual cell boundaries for the one or more cells in the field are identified.

Another aspect of the invention pertains to a method of identifying boundaries
of biological cells. The method involves receiving a first image of the field of one or
more cells in which a reference cell component of the one or more cells is identified
by a reference cell component marker image parameter, and receiving a second image
of the field of one or more cells in which at least one cell shape-indicative marker
of the one or more cells is identified by a cell shape-indicative marker image
parameter. Then, a thresholding technique is applied to the cell shape-indicative
marker image parameter in the second image to generate a digital representation of
the second image comprising a cell shape-indicative marker portion and a background
portion, and the boundaries of individual cells are identified by applying a watershed
algorithm to the second image using the reference cell component marker image
parameter and the background portion of the digital representation of the second
image as seeds.
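
For illustration, a seeded watershed of the kind described above may be sketched as a simple priority flood. This is a hedged sketch only and is not presented as the watershed algorithm of the invention. It assumes that the reference component (nuclei) has already been labeled with positive integers, that the thresholded second image is available as a Boolean foreground map, that bright marker pixels are flooded before dim ones, and that pixels are 4-connected. The names `seeded_watershed`, `nuclei_labels`, `foreground`, and the `BACKGROUND` label are hypothetical.

```python
import heapq

BACKGROUND = -1  # hypothetical label assigned to the background seed


def seeded_watershed(intensity, nuclei_labels, foreground):
    """Grow labeled seeds over a marker image by priority flooding.

    intensity:     2D list of marker (e.g., tubulin) intensities.
    nuclei_labels: 2D list; positive integers mark nuclei seeds, 0 elsewhere.
    foreground:    2D list of booleans from thresholding the marker image.
    Returns a 2D list in which each pixel carries a nucleus label or BACKGROUND.
    """
    rows, cols = len(intensity), len(intensity[0])
    labels = [[0] * cols for _ in range(rows)]
    heap = []
    counter = 0  # tie-breaker so equal priorities are served first-in, first-out

    # Seed the flood with the nuclei and with the thresholded background.
    for r in range(rows):
        for c in range(cols):
            if nuclei_labels[r][c] > 0:
                labels[r][c] = nuclei_labels[r][c]
            elif not foreground[r][c]:
                labels[r][c] = BACKGROUND
            else:
                continue
            heapq.heappush(heap, (-intensity[r][c], counter, r, c))
            counter += 1

    # Assumption: brighter pixels are flooded first, so regions meet along
    # the dimmer valleys between neighboring cells.
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (-intensity[nr][nc], counter, nr, nc))
                counter += 1
    return labels
```

Each unlabeled foreground pixel inherits the label of the first seed region to reach it, so every cell region contains exactly one nucleus seed, which reflects the benefit of using a reference component present in a single copy per cell.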

Another aspect of the invention pertains to computer program products
including a machine readable medium on which is stored program instructions for
implementing any of the methods described above. Any of the methods of this
invention may be represented as program instructions that can be provided on such
computer readable media.

These and other features and advantages of the present invention will be
described below in more detail with reference to the associated drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 depicts an image of a field of cells from cell line SF268 that have been
treated with rhodamine-labeled DM1 alpha antibody and imaged so that the extent of
the cell shape-indicative cytoskeletal protein tubulin in the field of cells is visible.

Fig. 2 is a process flow diagram depicting, at a high level, one process of
this invention for segmenting an image of cells and analyzing the segmented cell
according to its shape-related features.

Fig. 3 is a plot of intensity versus pixel location in an image and showing how 
thresholding may be used to segment an image into individual cells. 

Fig. 4A is a process flow diagram illustrating an image analysis process in
accordance with the present invention that produces a preliminary threshold-based
cell segmentation.

Fig. 4B is an illustration of calculating a threshold value based on an intensity 
histogram in accordance with the present invention. 

Fig. 4C is a binary image, representing a preliminary threshold-based cell
segmentation in accordance with the present invention, showing cells stained with
antibodies to tubulin overlaid with a rough periphery of objects identified by the
thresholding algorithm.

Fig. 5A visually describes steps of a cell segmentation procedure in
accordance with the present invention.

Fig. 5B depicts a visual illustration of a watershed algorithm. 

Fig. 5C provides a segmented cell image following application of a watershed
algorithm in accordance with the present invention, showing cells stained with
antibodies to tubulin, overlaid with a periphery of objects identified by the
segmentation algorithm.

Fig. 6A provides an illustration of a skeleton and skeleton end points and
nodes in a cellular context.

Fig. 6B is a process flow diagram illustrating an image analysis process in
accordance with the present invention for generating a polygon that approximates the
shape of a cell/object.

Fig. 6C illustrates an example of how the adjustable polygon cell shape 
approximation technique of the present invention may be applied to simplify and 
render biologically significant the end point and node features extracted from a cell 
following skeletonization. 

Fig. 7 illustrates a typical computer system that, when appropriately
configured or designed, can serve as an image analysis apparatus of this invention.

Figs. 8 and 9 depict the results of an experiment showing the effectiveness of
techniques in accordance with the present invention for cell segmentation and the
extraction of cell features evidencing differences in cell condition for cells treated
with a drug and untreated cells.

DETAILED DESCRIPTION OF THE INVENTION
Introduction

Generally, this invention relates to image analysis processes and apparatus
configured for image analysis. It also relates to machine-readable media on which are
provided instructions, data structures, etc. for performing the processes of this
invention. In accordance with this invention, images of cells are manipulated and
analyzed in certain ways to extract relevant cell shape-related features. Using those
features, the apparatus and processes of this invention can automatically draw certain
conclusions about the biology of a cell.

The invention provides methods and apparatus for the analysis of images
of cells and extraction of biologically-significant shape-related features from the cell
images. The extracted features may be correlated with particular conditions induced
by biologically-active agents (e.g., drugs or drug candidates) with which cells have
been treated, thereby enabling the automated analysis of cells based on cell shape
parameters. In particular, the invention provides methods for segmentation of cells in
an image using data from a plurality of separate images of different cell components.
One application of the invention involves the use of a reference cell component
(preferably one that has been previously segmented and therefore one whose
segmentation parameters are well understood and may be repeated) in combination
with image data for a second component to perform a segmentation of another cell
component or the whole cell. This application of the invention is particularly
effective when the reference component has been previously segmented and is present
in a single copy in the cell, such as the nucleus, centrosome, specific chromosome,
Golgi complex, etc. The invention further provides techniques for extraction of
biologically-relevant shape-related cell features from segmented cell images.

In accordance with the present invention, image data for a reference cell
component, preferably present in the cell in a single copy and/or previously
segmented (for example, cell nuclei), are processed together with image data for a cell
shape-indicative marker (for example, cytoskeletal components (e.g., tubulin), one or
more cytoplasmic proteins (for example, lactate dehydrogenase or total cell protein),
or membrane components (e.g., lipids or plasma membrane receptors)) in a watershed
technique. Further, the invention provides a skeletonization and skeleton analysis
technique for extracting biologically-relevant shape-related features from the
segmented images.

The invention will now be described in terms of particular specific
embodiments as depicted in the drawings. However, as will be apparent to those
skilled in the art, the present invention may be practiced without employing some
of the specific details disclosed herein. Some operations or features may be dispensed
with. Often, alternate elements or processes may be substituted. For example, in
the following description, nuclei are used as the reference cell component, and tubulin
as the cell shape-indicative component to provide image data useful for whole cell
segmentation. As noted above, cell components other than nuclei, particularly those
previously segmented and/or present in only a single copy in a cell, and other cell
shape-indicative components may also be used.

Preparation of the Image 

In accordance with the present invention, images may be obtained of cells that
have been treated with a chemical agent to render visible (or otherwise detectable in a
region of the electromagnetic spectrum) a cellular component. A common example of
such agents is colored dyes specific for a particular cellular component that is
indicative of cell shape. Other such agents may include fluorescent, phosphorescent
or radioactive compounds that bind directly or indirectly (e.g., via antibodies or other
intermediate binding agents) to a cell component. In accordance with the present
invention, a plurality of cell components may be treated with different agents and
imaged separately. For example, in one embodiment, cell nuclei, a previously
segmented component present in only a single copy in a cell, and tubulin, a
cytoskeletal protein, or other component indicative of cell shape, are treated as
described further below.

Generally, the images used as the starting point for the methods of this
invention are obtained from cells that have been specially treated and/or imaged under
conditions that contrast markers of cellular components of interest (e.g., tubulin for
the cytoskeleton, and DNA for the nuclei) from other cellular components and the
background of the image. In a preferred embodiment, the cells are fixed and then
treated with a material that binds to a marker for the components of interest and
shows up in an image. Preferably, the chosen imaging agent binds indiscriminately
with the marker, regardless of its location in the cell. The agent should provide a
strong contrast to other features in a given image. To this end, the agent should be
luminescent, radioactive, fluorescent, etc. Various stains and fluorescent compounds
may serve this purpose.

A variety of imaging agents are available depending on the particular marker,
and agents appropriate for labeling cytoskeletal, cytoplasmic, plasma membrane,
nuclear, and other discrete cell components are well known in the histology art.
Examples of such compounds include fluorescently labeled antibodies to cytoplasmic
or cytoskeletal proteins, fluorescent dyes which bind to proteins and/or lipids, labeled
ligands which bind to cell surface receptors, and fluorescent DNA intercalators and
fluorescently labeled antibodies to DNA or other nuclear components which bind to
the nuclei. For example, a suitable label for the cytoskeletal protein tubulin is a
fluorescently labeled monoclonal antibody to tubulin, rhodamine-labeled DM1 alpha,
produced from hybridoma DM1A, reported in the publication Blose et al., Journal of
Cell Biology, V98, 1984, 847-858. Examples of fluorescent DNA intercalators
include DAPI and Hoechst 33341 available from Molecular Probes, Inc. of Eugene,
Oregon. The antibodies may be fluorescently labeled either directly or indirectly.

In a preferred embodiment, cells may be treated with more than one imaging
agent, each imaging agent specific for a different cellular component of interest. The
component(s) may then be separately imaged by separately illuminating the cells with
an excitation frequency (channel) for the imaging agent of the marker for the
component of interest. Thus, different images of the same cells focusing on different
cellular components may be obtained on different channels.

Various techniques for preparing and imaging appropriately treated cells are
described in U.S. Patent Applications 09/310,879, 09/311,996, and 09/311,890,
previously incorporated by reference. In the case of cells treated with DAPI or other
fluorescent material, a collection of such cells is illuminated with light at an excitation
frequency. A detector is tuned to collect light at an emission frequency. The
collected light is used to generate the image and highlights regions of high marker
(e.g., tubulin or DNA) concentration.

In order to derive biologically meaningful cell shape information from an
analysis of cell images, it is important to be able to distinguish one cell from another
by establishing its boundaries and distinguishing it from other cells in the image (a
process sometimes referred to as "segmentation"). Fig. 1 depicts an image 101 of a
field of cells from cell line SF268 that have been treated with rhodamine-labeled
DM1 alpha antibody and imaged at an appropriate wavelength, which in this case is an
excitation wavelength for the fluorescent rhodamine label on the antibody. In this way,
the extent of the cell shape-indicative cytoskeletal protein tubulin in the field of cells
is visible. It should be noted that segmentation of cells may be challenging,
particularly in an image depicting a crowded field of cells such as depicted in image
101, where cells overlap and/or abut one another. In accordance with one embodiment
of the present invention, segmentation is a precursor to extracting features correlated
with cell shape that convey useful information about cell shape and condition (a
process sometimes referred to as "feature extraction").

A high level process flow 201 in accordance with one embodiment of this
invention is depicted in Fig. 2. As shown, the process begins at 202, 204 where one
or more image analysis tools (typically logic implemented in hardware and/or
software) obtain images showing a reference cell component (in this example, cell
nuclei; as noted above, images of other cell components, particularly those previously
segmented and/or present in a single copy in a cell, may also be used) and one or
more cell shape-indicative markers for one or more cells. Typically, images will be
taken from an assay plate or other cell support mechanism in which multiple cells are
growing or stored. For nuclei, the image is taken in a manner that allows the DNA
within the nuclei of the cells to be identified within the image. For a cell
shape-indicative marker, an image analysis tool (typically the same tool as used to obtain the
nuclei image) obtains an image showing a cell shape-indicative marker in a manner
similar to that described above except that the image is taken using a different
wavelength (channel) or microscopy technique associated with the cell
shape-indicative marker rather than the reference component marker (e.g., DNA for nuclei).

In each case, the image obtained will represent the imaged marker as a
corresponding "image parameter." The image parameter will be an intensity value of
light or radiation shown in the image. Often, the intensity value will be provided on a
per pixel basis. In addition, the intensity value may be provided at a particular
wavelength or narrow range of wavelengths that correspond to the emission frequency
of an imaging agent that specifically associates with the imaged marker.

In the following discussion and the figures of the present application, the
reference cell component and cell shape-indicative marker used to describe and
illustrate the principles of the present invention are nuclei and the cytoskeletal protein
tubulin, respectively. However, as noted above, the invention is not limited to the use
of nuclei and tubulin as the reference cell component and cell shape-indicative
marker. Instead, it should be understood that cell components other than nuclei,
particularly those previously segmented and/or present in only a single copy in a cell,
may also be used, and that tubulin is just one example of an array of markers that may
be correlated with and indicative of cell shape. Other such reference cell components
and cell shape-indicative markers may be used in place of or in conjunction with
nuclei and tubulin according to the principles of the invention described herein.

The relevant images obtained at 202, 204 are captured by an image acquisition
system. In one embodiment, the image acquisition system is directly coupled with the
image analysis tool of this invention. Alternatively, the image under consideration
may be provided by a remote system unaffiliated with the image acquisition system.
For example, the images may be acquired by a remote image analysis tool and stored
in a database or other repository until they are ready for use by an image analysis tool
of this invention.

Sometimes corrections must be made to the measured intensity. This is
because the absolute magnitude of intensity can vary from image to image due to
changes in the staining and/or image acquisition procedure and/or apparatus.
Specific optical aberrations can be introduced by various image collection
components such as lenses, filters, beam splitters, polarizers, etc. Other sources of
variability may be introduced by an excitation light source, a broad band light source
for optical microscopy, a detector's detection characteristics, etc. Even different areas
of the same image may have different characteristics. For example, some optical
elements do not provide a "flat field." As a result, pixels near the center of the image
have their intensities exaggerated in comparison to pixels at the edges of the image.
A correction algorithm may be applied to compensate for this effect. Such algorithms
can be easily developed for particular optical systems and parameter sets employed
using those imaging systems. One simply needs to know the response of the systems
under a given set of acquisition parameters.
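
As one hedged illustration of such a correction, the sketch below divides each pixel by the relative response of the optical system at that location. It assumes, purely for the example, that the response has been measured as an image of a uniformly bright field taken under the same acquisition parameters and that every response value is positive. The function name `flat_field_correct` and its arguments are hypothetical and do not describe any particular correction algorithm of the invention.

```python
def flat_field_correct(image, response):
    """Compensate for a non-flat field by dividing out the system response.

    image:    2D list of measured pixel intensities.
    response: 2D list of the same shape giving the measured response of the
              optical system at each pixel (assumed strictly positive).
    """
    n_pixels = sum(len(row) for row in response)
    mean_response = sum(sum(row) for row in response) / n_pixels
    # Scale by the mean response so the overall brightness is preserved.
    return [
        [pixel * mean_response / resp for pixel, resp in zip(img_row, resp_row)]
        for img_row, resp_row in zip(image, response)
    ]
```

Under these assumptions, an image of a uniform sample that appears brighter at the center than at the edges is returned with equal intensities throughout.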

Reference Cell Component Segmentation

After the nuclei image has been obtained at 202, the image analysis tool
segments the image into discrete nuclei representations at 206. The goal of
segmentation is to convert the image into discrete images/representations for the DNA
of each nucleus to generate a "nuclei mask" to be used in conjunction with cell
shape-indicative marker image data in a cell segmentation process in accordance with the
present invention. In a preferred embodiment, each representation includes only those
pixels where the DNA of a single cell nucleus is deemed to be present. Since the
DNA is normally contained almost entirely within the nucleus of non-mitotic
eucaryotic cells, the shape of each representation resulting from segmentation
represents the boundaries within which a nucleus lies. The nuclei mask is a
composite of the discrete nuclei representations providing intensity as a function of
position for each nucleus in the image.

Individual cell nucleus representations may be extracted from the image by
various image analysis procedures. Preferred approaches include edge finding
routines and thresholding routines. Some edge finding algorithms identify pixels at
locations where intensity is varying rapidly. For many applications of interest here,
pixels contained within the edges will have a higher intensity than pixels outside the
edges. Thresholding algorithms convert all pixels below a particular intensity value
to zero intensity in an image subregion (or the entire image, depending upon the
specific algorithm). The threshold value is chosen to discriminate between nucleus
(DNA) images and background. All pixels with intensity values above threshold in a
given neighborhood are deemed to belong to a particular cell nucleus.

The concepts underlying thresholding are well known. The technique is 
exemplified in Fig. 3, which presents a plot 301 of intensity versus pixel location for 
an entire image such as image 101. For simplicity, pixels from a single row of an 
image are considered. A threshold value 303 is chosen to extract those features of the 
image having intensity values deemed to correspond to actual cell nuclei. In this 
example, peaks 305, 307, and 309 all contain collections of pixels having intensity 
values above threshold 303. Therefore, each of these is deemed to be a separate 
nucleus for extraction during segmentation. Because peak 311 lies entirely below
threshold 303, it is not identified as a discrete cell nucleus.
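
The single-row case discussed above may be illustrated with the following minimal sketch, which groups consecutive above-threshold pixels of one image row into separate objects. The function name `segment_row` and the convention of returning half-open (start, end) index pairs are assumptions made for the example only.

```python
def segment_row(intensities, threshold):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive pixels whose intensity exceeds the threshold."""
    runs = []
    start = None
    for i, value in enumerate(intensities):
        if value > threshold and start is None:
            start = i  # a new above-threshold run begins here
        elif value <= threshold and start is not None:
            runs.append((start, i))  # the current run has ended
            start = None
    if start is not None:
        runs.append((start, len(intensities)))  # run reaches the row's end
    return runs
```

In this sketch, a peak that never rises above the threshold, like peak 311, produces no run and is therefore not reported as an object.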

An appropriate threshold may be calculated by various techniques. In a 
specific embodiment, the threshold value is chosen as the mode (highest value) of a 
contrast histogram In this technique, a contrast is computed for every pixel in the 
image. The contrast may be the intensity difference between a pixel and its 
neighbors. Next, for each intensity value (0-255 in an eight byte image), the average 
contrast is computed. The contrast histogram provides average contrast as a function 
of intensity. The threshold is chosen as the intensity value having the largest contrast. 
See "The Image Processing Handbook," Third Edition, John C. Russ, 1999, CRC Press 
LLC, IEEE Press, and "A Survey of Thresholding Techniques," P.K. Sahoo, S. Soltani 
and A.K.C. Wong, Computer Vision, Graphics, and Image Processing 41, 233-260 
(1988), both of which are incorporated herein by reference for all purposes. In a 
specific embodiment, edge detection may involve convolving images with the 
Laplacian of a Gaussian filter. The zero-crossings are detected as edge points. The 
edge points are linked to form closed contours, thereby segmenting the relevant image 
objects. See The Image Processing Handbook, referenced above. Further details 
regarding the segmentation of nuclei in accordance with the present invention and 
associated apparatus and techniques are described in co-pending patent applications 
Nos. 09/729,754 and [Attorney Client Docket No. CYTOP012] by 
Vaisberg et al., filed concurrently herewith and titled IMAGE ANALYSIS OF 
GOLGI COMPLEX, the disclosures of which have been previously incorporated by 
reference herein. 
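
The contrast-histogram threshold described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the function name is hypothetical, the image is assumed to be eight-bit, and "contrast" is assumed to be the mean absolute intensity difference between a pixel and its four nearest neighbors, which is one of several reasonable definitions.

```python
import numpy as np

def contrast_histogram_threshold(image):
    """Return (threshold, mean_contrast) for an 8-bit image (hypothetical helper)."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="edge")  # replicate border pixels
    # Assumed contrast: mean absolute difference to the 4 nearest neighbors.
    contrast = (
        np.abs(img - padded[:-2, 1:-1])
        + np.abs(img - padded[2:, 1:-1])
        + np.abs(img - padded[1:-1, :-2])
        + np.abs(img - padded[1:-1, 2:])
    ) / 4.0
    levels = img.astype(int).ravel()
    # Sum of contrast and number of pixels for each intensity value 0-255.
    total = np.bincount(levels, weights=contrast.ravel(), minlength=256)
    count = np.bincount(levels, minlength=256)
    # Average contrast as a function of intensity (0 where no pixel has that value).
    mean_contrast = np.divide(total, count, out=np.zeros(256), where=count > 0)
    # The threshold is the intensity value having the largest average contrast.
    return int(np.argmax(mean_contrast)), mean_contrast
```

In a real image the selected intensity tends to fall on the edges of bright objects, where neighboring pixels differ most.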

Cell Shape-Indicative Marker Thresholding 

After the tubulin image has been obtained at 206, the image analysis tool 
invokes a thresholding algorithm to convert the image to a binary representation of 
the image. See 208. As noted above, the concepts underlying thresholding in general 
are well known. However, in a preferred embodiment of the present invention, a 
particular technique for calculating the tubulin (or other cell shape-indicative marker) 
threshold is used. In this case, the threshold value is chosen to discriminate between 
tubulin and background. All pixels with intensity values above a particular 
(threshold) intensity value in an image subregion (or the entire image, depending 
upon the specific algorithm) are deemed to represent tubulin and are set to a non-zero 
value. All pixels below the threshold value are set to zero and are deemed to represent 
background. Of course, the values assigned to the marker (tubulin) and background are 
relative and may be reversed so that the marker is zero and the background non-zero. 
A cell shape-indicative marker thresholding algorithm in accordance with the 
invention is illustrated in the process flow 401 of Fig. 4A. 

The intensity is then analyzed for every pixel in the image. See 402. A 
histogram of the number of pixels having a given range of intensities over the range 
of intensities analyzed is computed. The histogram may be visualized as graphically 
depicted in Fig. 4B. See 404. The threshold is chosen according to the following 
function: 

Ith = Imax + c * Istd 

where Ith is the threshold, Imax is the intensity exhibited by the greatest number of pixels 
(assumed to be the background), c is a constant selected on empirical evidence 
suggesting its suitability for the purpose, and Istd is computed as follows: A 
symmetrical curve 424 of the left part of the histogram 422 up to the vertical line 
denoting Imax in Fig. 4B is computed for the right side of the Imax line. The 
combination of the two portions of the curve 422, 424 is fitted by a normal distribution. 
Istd is the standard deviation of this normal distribution. See 406. In a preferred 
embodiment of the present invention, c is between about 0.7 and 0.9, most preferably 
0.8. Once the threshold value, Ith, has been calculated, the image analysis tool 
makes a determination on a pixel-by-pixel basis of the image to convert the image to a 
binary digital representation of the image, with the binary elements designating, on 
the one hand, tubulin, and on the other hand, background (not tubulin). This binary 
image, representing a preliminary threshold-based cell segmentation, is depicted in 
Fig. 4C showing cells stained with antibodies to tubulin overlaid with a rough 
periphery of objects identified by the thresholding algorithm. As described further 
below, the resulting digital representation is further processed in conjunction with the 
nuclei mask, discussed above, to achieve an enhanced final segmentation of the cells 
in the original image. 
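
The threshold computation described above (histogram peak plus c times the standard deviation of the mirrored background curve) may be sketched as shown below. The function names are hypothetical and the image is assumed to be eight-bit. In place of an explicit curve fit, the sketch estimates the standard deviation of the mirrored curve from its second moment about Imax, which is an assumption made for brevity; a least-squares fit of a normal distribution could be substituted.

```python
import numpy as np

def marker_threshold(image, c=0.8):
    """Return (i_th, i_max, i_std) for an 8-bit marker image (hypothetical helper)."""
    pixels = np.asarray(image, dtype=np.uint8).ravel()
    counts = np.bincount(pixels, minlength=256)   # intensity histogram
    i_max = int(np.argmax(counts))                # histogram peak, assumed background
    left = counts[: i_max + 1].astype(float)      # histogram up to and including the peak
    k = i_max - np.arange(i_max + 1)              # distance of each left bin from the peak
    # Mirroring the left side about i_max doubles every bin except the peak itself.
    weights = 2.0 * left
    weights[-1] = left[-1]
    # Assumption: moment estimate of the standard deviation instead of a curve fit.
    i_std = float(np.sqrt(np.sum(weights * k**2) / np.sum(weights)))
    return i_max + c * i_std, i_max, i_std

def binarize(image, c=0.8):
    """Set pixels above the threshold to 1 (marker) and all others to 0 (background)."""
    i_th, _, _ = marker_threshold(image, c)
    return (np.asarray(image) > i_th).astype(np.uint8)
```

Using only the left side of the peak keeps the estimate of the background spread from being inflated by marker pixels, which lie to the right of Imax.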

Application of Watershed Algorithm 

Referring now to Fig. 5A, the digital images resulting from the nuclei 
segmentation, the nuclei mask 502, and the result of tubulin image thresholding, the 
binary tubulin image 506, are depicted. Each of the digital images has been obtained 
from a common original field of cells by an image acquisition tool operating in a 
different channel to obtain each image, as described more fully above. The nuclei 
mask 502 depicts the nuclei 504 of the cells in the original image. The binary tubulin 
image 506 depicts an outline of the regions of the original image where tubulin is 
present 508. The area not contained within the tubulin outline is considered 
background 510. 

These digital images are now processed in conjunction with each other using a 
watershed technique in order to achieve segmentation of the cells in the original 
image. See 210. The concepts underlying watershed algorithms are well known. The 
technique is illustrated by way of a geographic analogy in Fig. 5B, which presents a 
cross-section of a topology 525. The topology has peaks and valleys of various 
magnitudes, and includes two particularly deep valleys 526 and 528 containing the 
low-points of the topology, and a particularly high peak 530 between the valleys 526, 
528. The high peak 530 represents the point at which the two valleys 526, 528 
ultimately meet; by way of analogy, it is the point at which bodies of water rising from 
springs 532 and 534 (referred to as "seeds" in watershed terminology) at the base of 
the valleys 526, 528 would meet, and thus represents the ultimate boundary of the two 
valleys 526, 528. The top of the high peak is referred to as a "watershed." 

The parameters required for application of a watershed algorithm are an image 
and seeds. According to the present invention, a watershed algorithm is applied to 
the digital image data contained in the nuclei mask and the digital tubulin (or other 
cell shape-indicative marker) image in order to elucidate cell boundaries for 
segmentation of the cells. Referring again to Fig. 5A, nuclei mask 502 and binary 
tubulin image 506 (in particular here, the background of the original tubulin image 
510) are conceptually registered to form a composite digital image 512. By 
"conceptually registered" it is meant that the spatially corresponding pixels of each of 
the images are considered together. This composite digital image 512 contains all of 
the necessary seeds to apply a watershed algorithm on the original tubulin image 204. 
In this application of watershed, the original tubulin image (101 in Fig. 1) provides a 
"container component", and the seeds are the nuclei of the nuclei mask and the 
background of the digital tubulin image. Given these parameters, one of skill in the 
art can apply known watershed algorithms to the cell image data in order to elucidate 
watersheds (cell boundaries) and thereby achieve segmentation of the cells in the 
image. Image 514 (Fig. 5A) represents the segmented image following watershed in 
which individual cells 516 and, importantly, their shapes, may be seen. 
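
By way of illustration only, the use of two types of seeds (nuclei and tubulin-image background) on the original tubulin image may be sketched as follows. The sketch uses a simple priority-queue flooding rather than any particular published watershed algorithm, and it assumes that flooding proceeds over the negated tubulin intensity so that bright, tubulin-rich regions are filled first. The function names, the four-neighbor connectivity, and the choice to let adjacent cells abut without a separate boundary pixel are all assumptions of the sketch.

```python
import heapq
import numpy as np

def seeded_watershed(landscape, markers):
    """Grow labeled seeds over `landscape`, flooding the lowest values first."""
    labels = np.array(markers, dtype=int)
    rows, cols = labels.shape
    heap, order = [], 0  # `order` breaks ties so flooding is deterministic
    for r, c in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (float(landscape[r, c]), order, int(r), int(c)))
        order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]  # unlabeled neighbor joins this basin
                heapq.heappush(heap, (float(landscape[nr, nc]), order, nr, nc))
                order += 1
    return labels

def segment_cells(tubulin, nuclei_labels, tubulin_binary):
    """Combine nuclei seeds and background seeds, then flood the tubulin image."""
    nuclei_labels = np.asarray(nuclei_labels, dtype=int)
    background = int(nuclei_labels.max()) + 1  # one extra label for the background seed
    markers = nuclei_labels.copy()
    markers[(np.asarray(tubulin_binary) == 0) & (nuclei_labels == 0)] = background
    labels = seeded_watershed(-np.asarray(tubulin, dtype=float), markers)
    labels[labels == background] = 0           # report background as 0
    return labels
```

Here `nuclei_labels` is assumed to hold a distinct positive integer for each segmented nucleus, so each resulting cell region carries the label of its nucleus.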

Appropriate watershed algorithms suitable for use in accordance with the 
present invention are described in detail in L. Vincent and P. Soille, Watersheds in 
digital spaces: an efficient algorithm based on immersion simulations, IEEE 
Transactions on Pattern Analysis and Machine Intelligence, 13:583-589, 1991, 
incorporated by reference herein for all purposes. 

Fig. 5C provides a segmented cell image 551 of image 101 (Fig. 1) following 
application of a watershed algorithm in accordance with the present invention. It 
should be noted that the boundaries of the cells, and hence the cells' shapes, are clearly 
delineated. The segmented cell image is now well-suited for extraction of 
shape-based cell features. 

Feature Extraction 



At some point, an image analysis process must obtain image parameters 
relevant to a biological condition of interest. Typically, the parameters of interest 
relate to the size, shape, contour, and/or intensity of the cell images. Examples of 
some specific parameters for analyzing cell shape include the following: 



Parameter           Description

Total Intensity     sum of pixel intensities in an object
Average Intensity   average of pixel intensities in an object
Area                number of pixels in an object
Axes Ratio          ratio of lengths of axes of a fitted ellipse
Eccentricity        distance from the center of an ellipse to its focus
Solidity            measure of pixels inside versus pixels outside an object
                    surrounded by a simple shape
Extent              the area of the object divided by area of the smallest
                    box to contain the object
X_coord             the X coordinate of an object's centroid
Y_coord             the Y coordinate of an object's centroid
Form Factor         characteristic of the shape of the outline of an object
Diameter            the equivalent diameter of an object, that is the diameter
                    of the circle with the same area as the object
Moment              characteristic of the shape of an outline of an object,
                    also taking into account the distribution of pixels inside
                    the object

Image analysis routines for extracting these various parameters and others can 
be designed using well known principles. See The Image Processing Handbook, 
referenced above. In addition, various commercially available tools provide suitable 
extraction routines. Examples of some of these products include the MetaMorph 
Imaging System, provided by Universal Imaging Corporation, a company with 
headquarters in West Chester, PA, and NIH Image, provided by Scion Corporation, a 
company with headquarters in Frederick, Maryland. 
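
As a simple illustration of such routines, several of the parameters listed above can be computed directly from a binary object mask and the corresponding intensity image. The sketch below is not taken from any of the named products; the function name is hypothetical, and the "smallest box" used for Extent is assumed to be the axis-aligned bounding box of the object.

```python
import numpy as np

def basic_shape_features(mask, intensity):
    """Compute a few of the listed parameters for one object (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    values = np.asarray(intensity, dtype=float)[mask]
    ys, xs = np.nonzero(mask)                 # row and column indices of object pixels
    area = int(mask.sum())
    total = float(values.sum())
    # Assumption: the smallest containing box is the axis-aligned bounding box.
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    return {
        "total_intensity": total,
        "average_intensity": total / area,
        "area": area,
        "extent": area / float(height * width),
        "x_coord": float(xs.mean()),
        "y_coord": float(ys.mean()),
        # Diameter of the circle having the same area as the object.
        "diameter": float(np.sqrt(4.0 * area / np.pi)),
    }
```

Parameters that depend on a fitted ellipse or on the object outline (Axes Ratio, Eccentricity, Form Factor, Moment) are omitted from the sketch.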

The present invention also provides for the extraction of some novel features 
associated with cell shape and thus may be used to correlate cell shape information 
with particular cell conditions, for example a condition resulting from the treatment of 
cells with a putative toxic or therapeutic agent. See 212. The particular features of 
interest are skeletal end points and nodes of object masks. Fig. 6A provides an 
illustration of a skeleton and skeleton end points and nodes in the cellular context. 
The skeleton may be defined by end point 602 and node 604 parameters. An end 
point 602 is a point at the terminus of a skeleton branch in the cell. Nodes occur at 
points of intersection of the skeleton represented as lines or curves connecting end 
points. 
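
Given a one-pixel-wide binary skeleton, end points and nodes as defined above can be located by examining each skeleton pixel's eight neighbors. The following sketch uses the crossing number (the count of background-to-skeleton transitions around the neighborhood), which is one common way to do this and is an assumption here, not necessarily the method used in the invention. The function name is hypothetical.

```python
import numpy as np

# Offsets of the 8 neighbors in clockwise order, starting from north.
_RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def end_points_and_nodes(skeleton):
    """Return boolean masks (end_points, nodes) for a 1-pixel-wide skeleton."""
    sk = np.pad(np.asarray(skeleton, dtype=bool), 1).astype(int)  # zero border
    h, w = sk.shape
    # One shifted view of the image per neighbor direction.
    ring = [sk[1 + dr : h - 1 + dr, 1 + dc : w - 1 + dc] for dr, dc in _RING]
    # Crossing number: 0 -> 1 transitions when walking once around the neighborhood.
    crossings = sum((ring[i] == 0) & (ring[(i + 1) % 8] == 1) for i in range(8))
    core = sk[1:-1, 1:-1] == 1
    end_points = core & (crossings == 1)   # a single branch leaves the pixel
    nodes = core & (crossings >= 3)        # three or more branches meet
    return end_points, nodes
```

Counting the `True` entries of the two masks gives the number of end points and nodes for a cell.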

Skeletonization techniques, and techniques for the computation of end points 
and nodes from an object's skeleton, are well known. See, for example, J. C. Russ, 
The Image Processing Handbook, CRC Press, 1998, previously incorporated by 
reference herein. However, these techniques have not been used to extract features 
from biological cells because standard skeletonization algorithms have been found to 
produce poor results in the biological context. In particular, standard algorithms tend to be 
more sensitive to branching patterns common along cell membranes than is desirable 
to produce biologically significant results. 

The present invention provides techniques for extracting biologically 
significant end point and node parameters from images of cells. This is achieved by 
essentially simplifying the cell shape to a point where the end point and node 
branching pattern obtained from analysis of the cell image by application of known 
skeletonization techniques is reduced to a basic level that characterizes the cell in a 
biologically significant manner. Preferably, this requires that the cell shape be 
simplified to eliminate "noise" without significant loss of cell shape features that 
convey information characteristic of the cell condition. 

In a preferred embodiment, the invention accomplishes this by a process of 
adjustable polygon cell shape approximation. An algorithm illustrating a preferred 
embodiment of this process is depicted in the process flow 610 of Fig. 6B. The 
method of adjustable polygon cell shape approximation begins with an image of a cell 
whose boundary has been ascertained, typically a segmented cell image from an 
image depicting multiple cells, such as obtained by the techniques described herein 
above (612). A pair of points is selected along the boundary of the cell. These 
points are designated as endpoints (614). These two endpoints separate the closed 
cell boundary into two parts. The same procedure is followed iteratively for each of 
the two parts. For each part, the distance from each point to the line between the 
endpoints is computed, and a point dMAX on the portion of the cell boundary which is 
maximally distant from the line is determined (616). 

The distance from the point dMAX to the line is then compared to a predetermined 
threshold distance value, dTH (618). The threshold distance value, dTH, is computed 
according to the following function: 

dTH = c * A 

where A is the area of the cell and c is a constant selected on empirical evidence 
suggesting its suitability for the purpose. In a preferred embodiment of the present 
invention, c is between about 1/50 and 1/100, most preferably 1/80. 

If dMAX is greater than dTH, the line between the endpoints is discarded, point 
dMAX is used as a new endpoint together with one of the original endpoints, this part 
of the boundary is separated into two new parts, and the process for determining dMAX is 
repeated iteratively for each of the two new parts (620). When dMAX is less than dTH, 
the line between the endpoints is used as a side of a polygon approximating the cell 
shape (622). The process is repeated until no dMAX is greater than dTH and all the 
dMAX points selected in this iterative approach and their link lines complete a polygon 
approximating the shape of the cell (624, 626). 

Upon completion of the polygon approximating the shape of the cell, known 
skeletonization techniques may be applied to the polygon to generate its skeleton and 
extract biologically significant end point and node features for the particular cell 
shape (628). Fig. 6C illustrates an example of how the adjustable polygon cell shape 
approximation technique of the present invention may be applied to simplify and 
render biologically significant the end point and node features extracted from a cell 
following skeletonization. A cell 630 is shown with a complex shape that results in 
the generation of a complex, biologically insignificant skeleton composed of end 
points and nodes 632 according to conventional skeletonization/branch counting 
techniques. Below, a polygon 640 approximating the shape of the cell generates a 
much simpler skeleton 642 with biologically significant end points and nodes. Thus, 
by permitting the identification and counting of biologically significant end points 
and/or nodes, the technique provides a way to quantify differences in the condition of 
cells based on cell shape parameters. 
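
The adjustable polygon approximation described above (split a boundary part at its most distant point whenever that distance exceeds c times the cell area) can be sketched as follows. The procedure resembles the well-known Ramer-Douglas-Peucker line simplification. The function names are hypothetical, the boundary is assumed to be an ordered list of (x, y) points around the closed outline, and the choice of initial endpoints (the first boundary point and the point farthest from it) is an assumption, since the description leaves that choice open.

```python
import numpy as np

def _split(points, threshold):
    """Approximate one open boundary part; returns its vertices except the last endpoint."""
    start, end = points[0], points[-1]
    if len(points) <= 2:
        return [start]
    chord = end - start
    rel = points[1:-1] - start
    length = np.hypot(chord[0], chord[1])
    if length == 0:
        dist = np.hypot(rel[:, 0], rel[:, 1])
    else:
        # Perpendicular distance of each interior point from the line start-end.
        dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / length
    i = int(np.argmax(dist)) + 1               # index of the maximally distant point
    if dist[i - 1] > threshold:
        # Discard the line and repeat on the two new parts.
        return _split(points[: i + 1], threshold) + _split(points[i:], threshold)
    return [start]                             # the line becomes a polygon side

def approximate_polygon(boundary, area, c=1.0 / 80.0):
    """Return polygon vertices approximating a closed boundary."""
    pts = np.asarray(boundary, dtype=float)
    d_th = c * area                            # threshold distance from the cell area
    # Assumed initial endpoints: first point and the point farthest from it.
    far = int(np.argmax(np.hypot(pts[:, 0] - pts[0, 0], pts[:, 1] - pts[0, 1])))
    part_a = pts[: far + 1]
    part_b = np.vstack([pts[far:], pts[:1]])   # second part closes back to the start
    return np.array(_split(part_a, d_th) + _split(part_b, d_th))
```

A larger value of c removes more boundary detail before skeletonization, while a smaller value preserves more of it. The recursion depth grows with boundary length, so very long outlines may call for an iterative variant.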

Software/Hardware 

Generally, embodiments of the present invention employ various processes 
involving data stored in or transferred through one or more computer systems. 
Embodiments of the present invention also relate to an apparatus for performing these 
operations. This apparatus may be specially constructed for the required purposes, or 
it may be a general-purpose computer selectively activated or reconfigured by a 
computer program and/or data structure stored in the computer. The processes 
presented herein are not inherently related to any particular computer or other 
apparatus. In particular, various general-purpose machines may be used with 
programs written in accordance with the teachings herein, or it may be more 
convenient to construct a more specialized apparatus to perform the required method 
steps. A particular structure for a variety of these machines will appear from the 
description given below. 

In addition, embodiments of the present invention relate to computer readable 
media or computer program products that include program instructions and/or data 
(including data structures) for performing various computer-implemented operations. 
Examples of computer-readable media include, but are not limited to, magnetic media 
such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM 
disks; magneto-optical media; semiconductor memory devices; and hardware devices 
that are specially configured to store and perform program instructions, such as 
read-only memory devices (ROM) and random access memory (RAM). The data and 
program instructions of this invention may also be embodied on a carrier wave or 
other transport medium. Examples of program instructions include both machine 
code, such as produced by a compiler, and files containing higher level code that may 
be executed by the computer using an interpreter. 

Fig. 7 illustrates a typical computer system that, when appropriately 
configured or designed, can serve as an image analysis apparatus of this invention. 
The computer system 700 includes any number of processors 702 (also referred to as 
central processing units, or CPUs) that are coupled to storage devices including 
primary storage 706 (typically a random access memory, or RAM) and primary storage 
704 (typically a read only memory, or ROM). CPU 702 may be of various types 
including microcontrollers and microprocessors such as programmable devices (e.g., 
CPLDs and FPGAs) and unprogrammable devices such as gate array ASICs or 
general purpose microprocessors. As is well known in the art, primary storage 704 
acts to transfer data and instructions uni-directionally to the CPU and primary storage 
706 is used typically to transfer data and instructions in a bi-directional manner. Both 
of these primary storage devices may include any suitable computer-readable media 
such as those described above. A mass storage device 708 is also coupled 
bi-directionally to CPU 702 and provides additional data storage capacity and may 
include any of the computer-readable media described above. Mass storage device 
708 may be used to store programs, data and the like and is typically a secondary 
storage medium such as a hard disk. It will be appreciated that the information 
retained within the mass storage device 708 may, in appropriate cases, be 
incorporated in standard fashion as part of primary storage 706 as virtual memory. A 
specific mass storage device such as a CD-ROM 714 may also pass data 
uni-directionally to the CPU. 

CPU 702 is also coupled to an interface 710 that connects to one or more 
input/output devices such as video monitors, track balls, mice, keyboards, 
microphones, touch-sensitive displays, transducer card readers, magnetic or paper 
tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known 
input devices such as, of course, other computers. Finally, CPU 702 optionally may 
be coupled to an external device such as a database or a computer or 
telecommunications network using an external connection as shown generally at 712. 
With such a connection, it is contemplated that the CPU might receive information 
from the network, or might output information to the network in the course of 
performing the method steps described herein. 

WO 02/067195    PCT/US02/05728 

In one embodiment, the computer system 700 is directly coupled to an image 
acquisition system such as an optical imaging system that captures images of cells. 
Digital images from the image generating system are provided via interface 712 for 
image analysis by system 700. Alternatively, the images processed by system 700 are 
provided from an image storage source such as a database or other repository of cell 
images. Again, the images are provided via interface 712. Once the images are in the 
image analysis apparatus 700, a memory device such as primary storage 706 or mass 
storage 708 buffers or stores, at least temporarily, digital images of the cell. Typically, 
the cell images will show locations where a cell component marker of interest (e.g., DNA, 
tubulin, etc.) exists within the cells. In these images, local values of a marker image 
parameter (e.g., radiation intensity) correspond to amounts of the marker at the 
locations within the cell shown on the image. With this data, the image analysis 
apparatus 700 can perform various image analysis operations such as distinguishing 
between individual cells in an image of multiple cells and deciphering cell shape. To 
this end, the processor may perform various operations on the stored digital image. 
For example, it may obtain and process multiple images of the same original image 
using different channels (e.g., excitation/detection wavelengths) to analyze the 
image in a manner that extracts values of one or more cell shape-indicative parameters 
that correspond to a cellular condition. 

Example 

The following example provides the results of an experiment showing the 
effectiveness of techniques in accordance with the present invention for cell 
segmentation and the extraction of cell features evidencing differences in cell 
condition. It should be understood that the following is representative only, and that 
the invention is not limited by the detail set forth in this example. 

Two fields of HUVEC cells were stained for nuclei with DAPI and for tubulin 
with rhodamine-labeled DM1alpha antibody, as described above. One of the fields was 
treated with the anti-actin drug Cytochalasin D in a solution in DMSO. The second 
field was not treated with any drug, and had only DMSO applied to act as a control. 
Each field was imaged for nuclei and tubulin and the image data obtained was 
processed using the thresholding and watershed techniques described above to 
segment the HUVEC cells in each field. End point features were then extracted from 
the segmented HUVEC cells in each field using the processes of adjustable polygon 
cell shape approximation, skeletonization and feature extraction described above. 

The results are depicted in Figs. 8 and 9. The numbers on top of the cells 
represent the number of detected end points of each cell's skeleton. Fig. 8 depicts the 
image of the field of cells treated with anti-actin Cytochalasin D. Fig. 9 depicts the 
image of the control field of cells to which no drug had been applied. While virtually 
all of the cells in the control field (Fig. 9) show only 2 end points, a significantly 
larger proportion of the cells in the drug treated field show an increase in the number 
of end points (3, 4, 5 and even 6), which reflects an increase in the number of 
branched (or "arborized") cells. This increase in the number of endpoints in treated 
relative to the untreated (control) cells may be correlated with the more highly 
arborized cell condition (phenotype) resulting from treatment of HUVEC cells with 
anti-actin Cytochalasin D. 

Conclusion 

Although the foregoing invention has been described in some detail for 
purposes of clarity of understanding, those skilled in the art will appreciate that 
various adaptations and modifications of the just-described preferred embodiments 
can be configured without departing from the scope and spirit of the invention. For 
example, a variety of reference components and cell shape-indicative markers other 
than nuclei and tubulin may be used, and the cell segmentation and feature extraction 
techniques described may be used alone or in combination. Therefore, the described 
embodiments should be taken as illustrative and not restrictive, and the invention 
should not be limited to the details given herein but should be defined by the 
following claims and their full scope of equivalents. 






CLAIMS 

What is claimed is: 

1. A method of identifying boundaries of biological cells, the method 
comprising: 

receiving a first image of a field of one or more cells in which a reference cell 
component of the one or more cells is identified by a reference cell component marker 
image parameter; 

receiving a second image of the field of one or more cells in which at least one 
of a cell shape-indicative marker of the one or more cells is identified by a cell 
shape-indicative marker image parameter; and 

processing the first image in conjunction with the second image such that 
individual cell boundaries for the one or more cells in the field are identified. 

2. The method of claim 1, wherein the images of the field of one or more cells 
are digital representations of the field of one or more cells. 

3. The method of claim 1, wherein the reference cell component is present in 
only a single copy in a cell. 

4. The method of claim 3, wherein the reference cell component is selected from 
the group consisting of nucleus, centrosome, a specific chromosome and Golgi 
complex. 

5. The method of claim 1, wherein the reference cell component is a previously 
segmented cell component. 

6. The method of claim 5, wherein the reference cell component is present in 
only a single copy in a cell. 

7. The method of claim 6, wherein the reference cell component is nucleus. 

8. The method of claim 1, wherein the cell shape-indicative marker is at least one 
of a cytoskeletal, a cytoplasmic, and a plasma membrane marker. 

9. The method of claim 8, wherein the cytoskeletal marker is tubulin. 

10. The method of claim 8, wherein the cytoplasmic marker is a cytoplasmic 
protein. 

11. The method of claim 8, wherein the plasma membrane marker is at least one 
of lipids and plasma membrane receptors. 

12. The method of claim 1, wherein said image processing comprises: 

segmentation of the reference cell component in the first image to generate a 
digital representation of the first image (reference cell component mask); 

thresholding the cell shape-indicative marker in the second image to generate 
a digital representation of the second image comprising a cell shape-indicative marker 
portion and a background portion; 

conceptually registering the reference cell component mask with the digital 
representation of the second image; and 

applying a watershed algorithm to data provided by the registered reference 
component mask and digital representation of the second image to segment the one or 
more cells in the field. 

13. The method of claim 12, wherein said reference cell component is nucleus and 
said cell shape-indicative marker is tubulin. 

14. The method of claim 12, wherein said segmentation of the reference cell 
component comprises: 

converting the first image to a digital representation of the first image, 

wherein a pixel having an image parameter intensity greater than a 
threshold intensity, Ith, is recognized as a reference cell component marker 
and is assigned one of 0 and non-zero, and 

wherein a pixel having an image parameter intensity less than the 
threshold intensity, Ith, is not recognized as the reference cell component 
marker and is assigned the other of 0 and non-zero; 

wherein said threshold is chosen as the mode of a contrast histogram of 
the first image. 

15. The method of claim 14, wherein said reference component is nucleus and 
said reference component marker is DNA. 

16. The method of claim 12, wherein said thresholding of the cell shape-indicative 
marker comprises: 



converting the second image to a digital representation of the second image, 

wherein a pixel having an image parameter intensity greater than a 
threshold intensity, Ith, is recognized as the cell shape-indicative marker and 
is assigned one of 0 and non-zero, and 

wherein a pixel having an image parameter intensity less than the 
threshold intensity, Ith, is recognized as background and is assigned the other 
of 0 and non-zero; 

wherein said threshold is calculated according to a method comprising, 

generating a histogram of number of pixels versus image parameter 
intensity, 

assigning an intensity of the greatest number of pixels, Imax, as 
background intensity, 

determining a standard deviation of a normal distribution of the 
background intensity, Istd, and 

assigning a value to Ith = Imax + c * Istd. 

17. The method of claim 16, wherein 0.7 < c < 0.9. 

18. The method of claim 17, wherein c = 0.8. 

19. The method of claim 12, wherein said application of said watershed algorithm to 
the data provided by the conceptually registered reference cell component mask and 
digital representation of the second image uses the original cell image and two 
different types of seeds. 

20. The method of claim 19, wherein the seeds are the reference cell component 
portion of the reference component mask and the background portion of the digital 
representation of the second image. 

21. The method of claim 20, wherein said reference cell component is nucleus. 

22. A method of identifying boundaries of biological cells, the method 
comprising: 

receiving a first image of a field of one or more cells in which a reference cell 
component of the one or more cells is identified by a reference cell component marker 
image parameter; 

receiving a second image of the field of one or more cells in which at least one 
of a cell shape-indicative marker of the one or more cells is identified by a cell 
shape-indicative marker image parameter; 

segmenting the one or more reference cell components in the first image to 
generate a digital representation of the first image (reference cell component mask); 

thresholding the cell shape-indicative marker in the second image to generate 
a digital representation of the second image comprising a cell shape-indicative marker 
portion and a background portion; 

conceptually registering the nuclei mask with the digital representation of the 
second image; and 

applying a watershed algorithm to data provided by the registered reference 
component mask and digital representation of the second image such that individual 
cell boundaries for the one or more cells in the field are identified. 

23. The method of claim 22, wherein said reference cell component is nucleus and 
said cell shape-indicative marker is tubulin. 

24. The method of claim 22, further comprising, extracting biologically- 
significant features from the one or more segmented cells. 

25. The method of claim 24, wherein said biologically-significant features 
comprise cell skeletal nodes and end points. 

26. The method of claim 25, wherein said nodes and end points are extracted 
following a skeletonization of the one or more segmented cells. 

27. The method of claim 26, wherein said skeletonization comprises conducting 
an adjustable polygon shape approximation of the one or more segmented cells. 

28. The method of claim 27, wherein said adjustable polygon cell shape 
15 approximation comprises: 

(a) providing an image of the one or more segmented cells, the boundary of said one 
or more segmented cells having been ascertained by the segmentation; 

for each of one or more of the cells in the segmented cell image, 

(b) selecting two endpoints defining two parts of the boundary of at least one of said 
20 one or more cells; 

(c) for each part of said cell boundary, computing the distance from each point on the 
part of the cell boundary to a line between said endpoints; 

(d) determining a point dMAx on the portion of the boundary, said point d^AX being 
maximally distant from the line; 



31 



WO 02/067195 



PCT7US02/05728 



(e) comparing the distance from the point <1max to a predetermined threshold distance 
value, dm; 

(f) where dMAX is greater than dm, 

discarding the line between the endpoints, 

5 using point dMAX as a new endpoint together with one of the original endpoints to 
separate the part into two new parts, and 

repeating (c) and following; and 

(g) where dMAX is less than dm, using the line as a side of a polygon approximating 
the cell shape until a polygon approximating the shape of the cell is complete. 

29. The method of claim 28, wherein dTH = c * A, where A is the area of the cell
and c is a constant.

30. The method of claim 29, wherein 1/50 > c > 1/100.

31. The method of claim 30, wherein c = 1/80.
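
By way of illustration only, the adjustable polygon approximation recited in claims 28 to 31 might be sketched in Python as follows. The function names, the choice of the first boundary point and the point farthest from it as the two initial endpoints, and the shoelace estimate of the cell area A are assumptions of this sketch; the claims do not specify them.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def split_part(points, d_th):
    """Steps (c)-(g) for one boundary part. Returns the polygon vertices of
    the part, excluding its final endpoint (which starts the next part)."""
    a, b = points[0], points[-1]
    if len(points) < 3:
        return [a]
    dists = [point_line_distance(p, a, b) for p in points[1:-1]]  # step (c)
    i_max = max(range(len(dists)), key=dists.__getitem__) + 1     # step (d)
    if dists[i_max - 1] > d_th:                                   # steps (e), (f)
        return (split_part(points[:i_max + 1], d_th)
                + split_part(points[i_max:], d_th))
    return [a]                                                    # step (g)

def polygon_area(boundary):
    """Shoelace area of a closed contour (assumed estimate of A)."""
    n = len(boundary)
    s = sum(boundary[i][0] * boundary[(i + 1) % n][1]
            - boundary[(i + 1) % n][0] * boundary[i][1] for i in range(n))
    return abs(s) / 2.0

def approximate_polygon(boundary, c=1.0 / 80.0):
    """boundary: ordered (x, y) points of a closed cell contour."""
    d_th = c * polygon_area(boundary)  # threshold of claims 29 to 31
    # Step (b), assumed rule: first point and the point farthest from it.
    k = max(range(len(boundary)),
            key=lambda i: math.dist(boundary[0], boundary[i]))
    part1 = boundary[:k + 1]
    part2 = boundary[k:] + [boundary[0]]
    return split_part(part1, d_th) + split_part(part2, d_th)

# Hypothetical contour: a 4 x 4 square sampled at unit spacing.
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
          + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
polygon = approximate_polygon(square)
```

For the sampled square the sketch keeps only the four corners, since every other boundary point lies on a side of the polygon.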

32. A method of extracting biologically-significant shape-related information 
from a field of one or more cells, comprising:

(a) providing a segmented image of the field of one or more segmented cells, the 
boundaries of said one or more segmented cells having been ascertained by the 
segmentation; 

for each of one or more of the cells in the segmented cell image, 

(b) selecting two endpoints defining two parts of the boundary of at least one of said
one or more cells; 

(c) for each part of said cell boundary, computing the distance from each point on the 
part of the cell boundary to a line between said endpoints; 

(d) determining a point dMAX on the portion of the boundary, said point dMAX being
maximally distant from the line;

(e) comparing the distance from the point dMAX to a predetermined threshold distance
value, dTH;

(f) where dMAX is greater than dTH,

discarding the line between the endpoints, 

using point dMAX as a new endpoint together with one of the original endpoints to
separate the part into two new parts, and 

repeating (c) and following; 

(g) where dMAX is less than dTH, using the line as a side of a polygon approximating
the cell shape until a polygon approximating the shape of the cell is complete; and 

(h) skeletonizing and computing at least one of end points and nodes for the polygon 
approximation of the cell. 
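
As a non-limiting sketch of the counting in step (h), end points and nodes might be read off a skeleton once it has been computed. The sketch below assumes the skeleton is already available as a list of edges between (x, y) points, and assumes that an end point is a skeleton point with exactly one incident edge and a node is a point with three or more; the skeletonization itself is not shown, and the function name is hypothetical.

```python
from collections import Counter

def end_points_and_nodes(edges):
    """edges: list of ((x1, y1), (x2, y2)) skeleton segments.
    Returns (end_points, nodes) under the assumed degree-based definitions."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    end_points = sorted(p for p, d in degree.items() if d == 1)
    nodes = sorted(p for p, d in degree.items() if d >= 3)
    return end_points, nodes

# Hypothetical Y-shaped skeleton: three branches meeting at (1, 1).
y_skeleton = [((0, 0), (1, 1)), ((2, 0), (1, 1)), ((1, 1), (1, 3))]
ends, nodes = end_points_and_nodes(y_skeleton)
```

For the Y-shaped example the sketch reports three end points and one node, which are the kind of counts that later steps can compare between cells.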

33. The method of claim 22, wherein dTH = c * A, where A is the area of the cell
and c is a constant.

34. The method of claim 23, wherein 1/50 > c > 1/100.

35. The method of claim 24, wherein c = 1/80. 

36. The method of claim 1, wherein the one or more cells are treated with a first
imaging agent that selectively associates with the reference cell component marker
and emits a signal recorded as the reference cell component marker image parameter,
and said one or more cells are treated with a second imaging agent that selectively 
associates with a cell shape-indicative marker and emits a signal recorded as the cell 
shape-indicative marker image parameter. 

37. The method of claim 36, wherein the reference cell component is nuclei, the 
reference cell component marker is DNA, and the first imaging agent is a DNA stain.

38. The method of claim 37, wherein the DNA stain is DAPI.

39. The method of claim 36, wherein the cell shape-indicative marker is tubulin, 
and the second imaging agent is a fluorescently-labeled monoclonal antibody. 

40. The method of claim 39, wherein the fluorescently-labeled monoclonal 
antibody is rhodamine-labeled DM1 alpha antibody.

41. The method of claim 1, wherein the reference cell component marker and cell 
shape-indicative marker image parameters are a light or radiation intensity.

42. The method of claim 1, wherein the reference cell component marker and cell
shape-indicative marker image parameters are an electromagnetic radiation intensity 
provided at a particular wavelength or range of wavelengths. 

43. The method of claim 12, further comprising, extracting biologically- 
significant features from the one or more segmented cells. 

44. The method of claim 43, wherein said biologically-significant features 
comprise cell skeletal nodes and end points. 

45. The method of claim 44, wherein said nodes and end points are extracted 
following a skeletonization of the one or more segmented cells. 

46. The method of claim 45, wherein said skeletonization comprises conducting 
an adjustable polygon shape approximation of the cell. 

47. A method of correlating a cell's shape with a biological condition of the cell, 
comprising: 

(a) providing a plurality of segmented images of fields of one or more segmented 
cells, at least one of said fields having been treated with a biologically active agent 
and at least one of said fields being a control and having not been treated with the 
biologically active agent, the boundaries of said one or more segmented cells having 
been ascertained by the segmentation; 

for each of one or more of the cells in the plurality of segmented cell images, 

(b) selecting two endpoints defining two parts of the boundary of said cell;

(c) for each part of said cell boundary, computing the distance from each point on the 
part of the cell boundary to a line between said endpoints; 

(d) determining a point dMAX on the portion of the boundary, said point dMAX being
maximally distant from the line;

(e) comparing the distance from the point dMAX to a predetermined threshold distance
value, dTH;

(f) where dMAX is greater than dTH,

discarding the line between the endpoints, 

using point dMAX as a new endpoint together with one of the original endpoints to
separate the part into two new parts, and 

repeating (c) and following; 

(g) where dMAX is less than dTH, using the line as a side of a polygon approximating
the cell shape until a polygon approximating the shape of the cell is complete; 

(h) skeletonizing and computing at least one of end points and nodes for the polygon
approximation of the cell; and 

(i) comparing the computations of the at least one of end points and nodes for the 
polygon approximation of the cell to identify significant shape differences between 
the treated and control fields of one or more cells. 
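
Step (i) does not prescribe a particular comparison. Purely as an illustrative assumption, the per-cell end point (or node) counts from a treated field and a control field might be compared with Welch's t statistic, as sketched below. The function name and the sample counts are hypothetical, and any other suitable statistical comparison could be substituted.

```python
import math
from statistics import mean, variance

def welch_t(treated, control):
    """Welch's t statistic for two samples of per-cell counts
    (an assumed choice of comparison, not one recited in the claim)."""
    vt = variance(treated) / len(treated)
    vc = variance(control) / len(control)
    return (mean(treated) - mean(control)) / math.sqrt(vt + vc)

# Hypothetical end point counts per cell, for illustration only.
treated_counts = [6, 7, 8, 7, 7]
control_counts = [3, 4, 3, 4, 3]
t_stat = welch_t(treated_counts, control_counts)
```

A large magnitude of the statistic would suggest that the treated and control fields differ in the chosen shape feature; the cut-off for significance is left to the practitioner.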

48. The method of claim 47, wherein said segmentation comprises, for each field:

receiving a first image of a field of one or more cells in which a reference cell 
component of the one or more cells is identified by a reference cell component marker 
image parameter, 

receiving a second image of the field of one or more cells in which at least one
of a cell shape-indicative marker of the one or more cells is identified by a cell shape- 
indicative marker image parameter, and 

processing the first image in conjunction with the second image such that 
individual cell boundaries for the one or more cells in the field are identified. 

49. The method of claim 47, wherein said computing of at least one of end points 
and nodes for the polygon approximation of the cell comprises quantifying the at least 
one of end points and nodes for the polygon approximation of the cell.

50. The method of claim 1, further comprising extracting
biologically-significant features from the one or more identified cells.

51. A computer program product comprising a machine readable medium on 
which is provided program instructions for identifying individual biological cells in a 
field of cells, the instructions comprising: 

code for receiving a first image of a field of one or more cells in which a 
reference cell component of the one or more cells is identified by a reference cell 
component marker image parameter; 

code for receiving a second image of the field of one or more cells in which at 
least one of a cell shape-indicative marker of the one or more cells is identified by a 
cell shape-indicative marker image parameter, and 

code for processing the first image in conjunction with the second image such 
that individual cell boundaries for the one or more cells in the field are identified. 

52. The computer program product of claim 51, wherein the one or more cells are
treated with a first imaging agent that selectively associates with the reference cell 
component marker and emits a signal recorded as the reference cell component 
marker image parameter, and said one or more cells are treated with a second imaging 
agent that selectively associates with a cell shape-indicative marker and emits a signal 
recorded as the cell shape-indicative marker image parameter. 

53. The computer program product of claim 52, wherein the reference cell
component is nuclei, the reference cell component marker is DNA, and the first 
imaging agent is a DNA stain. 

54. The computer program product of claim 53, wherein the DNA stain is DAPI.

55. The computer program product of claim 52, wherein the cell shape-indicative 
marker is tubulin, and the second imaging agent is a fluorescently-labeled 
monoclonal antibody. 

56. The computer program product of claim 55, wherein the fluorescently-labeled 
monoclonal antibody is rhodamine-labeled DM1 alpha antibody.

57. The computer program product of claim 51, wherein the reference cell 
component marker and cell shape-indicative marker image parameters are a light or 
radiation intensity. 

58. The computer program product of claim 51, wherein the reference cell 
component marker and cell shape-indicative marker image parameters are an 
electromagnetic radiation intensity provided at a particular wavelength or range of 
wavelengths. 

59. The computer program product of claim 51, wherein said code for processing 
comprises: 

code for segmenting the one or more reference cell components in the first 
image to generate a digital representation of the first image (reference cell component 
mask); 

code for thresholding the cell shape-indicative marker in the second image to 
generate a digital representation of the second image comprising a cell shape- 
indicative marker portion and a background portion; 

code for conceptually registering the nuclei mask with the digital 
representation of the second image; and 

code for applying a watershed algorithm to data provided by the registered
reference cell component mask and digital representation of the second image such
that individual cell boundaries for the one or more cells in the field are identified.

60. The computer program product of claim 59, further comprising code for
extracting biologically-significant features from the one or more segmented cells.

61. The computer program product of claim 52, wherein said biologically-
significant features comprise cell skeletal nodes and end points.

62. The computer program product of claim 61, wherein said nodes and end points 
are extracted following a skeletonization of the one or more segmented cells. 

63. The computer program product of claim 62, wherein said skeletonization
comprises conducting an adjustable polygon shape approximation of the one or more
segmented cells.

64. The computer program product of claim 59, wherein said code for 
thresholding of the cell shape-indicative marker comprises: 

code for converting the second image to a digital representation of the second
image,

wherein a pixel having an image parameter intensity greater than a
threshold intensity, Ith, is recognized as the cell shape-indicative marker and
is assigned one of 0 and non-zero, and

wherein a pixel having an image parameter intensity less than the
threshold intensity, Ith, is recognized as background and is assigned the other
of 0 and non-zero;

wherein said threshold is calculated according to code for implementing a
method comprising,

generating a histogram of number of pixels versus image parameter
intensity,

assigning an intensity of the greatest number of pixels, Imax, as
background intensity,

determining a standard deviation of a normal distribution of the
background intensity, Istd, and

assigning a value to Ith = Imax + c * Istd.

65. The computer program product of claim 64, wherein 0.7 < c < 0.9.

66. The computer program product of claim 65, wherein c = 0.8. 
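
The threshold calculation of claims 64 to 66 might be sketched, by way of example only, as follows. The claims do not state how the standard deviation of the background distribution is estimated; this sketch assumes it is estimated from the pixels at or below the modal intensity, treated as the lower half of a normal distribution centred on Imax. The function names and the sample image are hypothetical.

```python
import math
from collections import Counter

def marker_threshold(image, c=0.8):
    """Return (i_max, i_std, i_th) with i_th = i_max + c * i_std."""
    pixels = [p for row in image for p in row]
    hist = Counter(pixels)               # histogram: intensity -> pixel count
    i_max = max(hist, key=hist.get)      # most frequent intensity = background
    # Assumed estimator: spread of the pixels at or below the mode,
    # measured about the mode (lower half of a normal distribution).
    lower = [p for p in pixels if p <= i_max]
    i_std = math.sqrt(sum((p - i_max) ** 2 for p in lower) / len(lower))
    return i_max, i_std, i_max + c * i_std

def binarize(image, i_th):
    """Assign 1 to marker pixels (above i_th) and 0 to background pixels."""
    return [[1 if p > i_th else 0 for p in row] for row in image]

# Hypothetical second (marker) image, for illustration only.
image = [[10, 10, 10, 9],
         [11, 10, 50, 60],
         [9, 10, 11, 55]]
i_max, i_std, i_th = marker_threshold(image, c=0.8)
mask = binarize(image, i_th)
```

Here the value 1 for marker and 0 for background is one of the two assignments the claim permits; the opposite assignment would serve equally well.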

67. The computer program product of claim 59, wherein said application of said
watershed algorithm to the data provided by the conceptually registered reference cell
component mask and digital representation of the second image uses the original cell
image and two different types of seeds. 

68. The computer program product of claim 67, wherein the seeds are the 
reference cell component portion of the reference component mask and the 
background portion of the digital representation of the second image. 

69. The computer program product of claim 63, wherein said adjustable polygon
cell shape approximation comprises: 

(a) code for providing an image of the one or more segmented cells, the boundary of 
said one or more segmented cells having been ascertained by the segmentation; 

for each of one or more of the cells in the segmented cell image, 

(b) code for selecting two endpoints defining two parts of the boundary of said cell;

(c) for each part of said cell boundary, code for computing the distance from each 
point on the part of the cell boundary to a line between said endpoints; 

(d) code for determining a point dMAX on the portion of the boundary, said point dMAX
being maximally distant from the line;

(e) code for comparing the distance from the point dMAX to a predetermined threshold
distance value, dTH;

(f) where dMAX is greater than dTH,

code for discarding the line between the endpoints,

code for using point dMAX as a new endpoint together with one of the original
endpoints to separate the part into two new parts, and

code for repeating (c) and following; and

(g) where dMAX is less than dTH, code for using the line as a side of a polygon
approximating the cell shape until a polygon approximating the shape of the cell is
complete.

70. The computer program product of claim 69, wherein dTH = c * A, where A is
the area of the cell and c is a constant.

71. The computer program product of claim 70, wherein 1/50 > c > 1/100.

72. The computer program product of claim 71, wherein c = 1/80. 

73. A computer program product comprising a machine readable medium on
which is provided program instructions for extracting biologically-significant shape-
related information from a field of one or more cells, the instructions comprising:

(a) code for providing a segmented image of the field of one or more segmented cells,
the boundaries of said one or more segmented cells having been ascertained by the
segmentation;

for each of one or more of the cells in the segmented cell image, 

(b) code for selecting two endpoints defining two parts of the boundary of said cell; 

(c) for each part of said cell boundary, code for computing the distance from each 
point on the part of the cell boundary to a line between said endpoints; 

(d) code for determining a point dMAX on the portion of the boundary, said point dMAX
being maximally distant from the line;

(e) code for comparing the distance from the point dMAX to a predetermined threshold
distance value, dTH;

(f) where dMAX is greater than dTH,

code for discarding the line between the endpoints, 

code for using point dMAX as a new endpoint together with one of the original 
endpoints to separate the part into two new parts, and 

code for repeating (c) and following; 

(g) where dMAX is less than dTH, code for using the line as a side of a polygon
approximating the cell shape until a polygon approximating the shape of the cell is
complete; and

(h) code for skeletonizing and computing at least one of end points and nodes for the 
polygon approximation of the cell. 

74. The computer program product of claim 73, wherein dTH = c * A, where A is
the area of the cell and c is a constant.

75. The computer program product of claim 74, wherein 1/50 > c > 1/100.

76. The computer program product of claim 75, wherein c = 1/80. 

77. A computer program product comprising a machine readable medium on
which is provided program instructions for correlating a cell's shape with a biological
condition of the cell, the instructions comprising:

(a) providing a plurality of segmented images of fields of one or more segmented 
cells, at least one of said fields having been treated with a biologically active agent 
and at least one of said fields being a control and having not been treated with the 
biologically active agent, the boundaries of said one or more segmented cells having
been ascertained by the segmentation; 

for each of one or more of the cells in the plurality of segmented cell images, 

(b) code for selecting two endpoints defining two parts of the boundary of said cell; 

(c) for each part of said cell boundary, code for computing the distance from each
point on the part of the cell boundary to a line between said endpoints; 

(d) code for determining a point dMAX on the portion of the boundary, said point dMAX
being maximally distant from the line;

(e) code for comparing the distance from the point dMAX to a predetermined threshold
distance value, dTH;

(f) where dMAX is greater than dTH,

code for discarding the line between the endpoints, 

code for using point dMAX as a new endpoint together with one of the original
endpoints to separate the part into two new parts, and

code for repeating (c) and following;

(g) where dMAX is less than dTH, code for using the line as a side of a polygon
approximating the cell shape until a polygon approximating the shape of the cell is 
complete; 

(h) code for skeletonizing and computing at least one of end points and nodes for the 
polygon approximation of the cell; and

(i) code for comparing the computations of the at least one of end points and nodes for 
the polygon approximation of the cell to identify significant differences between the 
treated and control fields of one or more cells. 

78. The computer program product of claim 77, wherein said code for
segmentation comprises, for each field:

code for receiving a first image of a field of one or more cells in which a
reference cell component of the one or more cells is identified by a reference cell 
component marker image parameter; 

code for receiving a second image of the field of one or more cells in which at 
least one of a cell shape-indicative marker of the one or more cells is identified by a 
cell shape-indicative marker image parameter, and 

code for processing the first image in conjunction with the second image such 
that individual cell boundaries for the one or more cells in the field are identified. 

79. The computer program product of claim 77, wherein said code for computing 
of at least one of end points and nodes for the polygon approximation of the cell 
comprises code for quantifying the at least one of end points and nodes for the 
polygon approximation of the cell. 

80. The computer program product of claim 77, wherein dTH = c * A, where A is
the area of the cell and c is a constant.

81. The computer program product of claim 80, wherein 1/50 > c > 1/100.

82. The computer program product of claim 81, wherein c = 1/80.

83. An image analysis apparatus for identifying individual biological cells in a 
field of cells, the apparatus comprising: 

a memory or buffer adapted to store, at least temporarily, a first image of a 
field of one or more cells, in which a reference cell component of the one or more 
cells is identified by a reference cell component marker image parameter, and a 
second image of the field of one or more cells, in which at least one of a cell shape-
indicative marker of the one or more cells is identified by a cell shape-indicative
marker image parameter, and 

a processor configured or designed to process the first image in conjunction 
with the second image such that individual cell boundaries for the one or more cells in 
the field are identified. 

84. The apparatus of claim 83, further comprising an interface adapted to receive
the images of the field of one or more cells. 

85. The apparatus of claim 83, further comprising an image acquisition system 
that produces the images of the field of one or more cells. 

86. The apparatus of claim 83, wherein the one or more cells are treated with a 
first imaging agent that selectively associates with the reference cell component 
marker and emits a signal recorded as the reference cell component marker image 
parameter, and said one or more cells are treated with a second imaging agent that 
selectively associates with a cell shape-indicative marker and emits a signal recorded 
as the cell shape-indicative marker image parameter. 

87. The apparatus of claim 86, wherein the reference cell component is nuclei, the 
reference cell component marker is DNA, and the first imaging agent is a DNA stain.

88. The apparatus of claim 87, wherein the DNA stain is DAPI. 

89. The apparatus of claim 86, wherein the cell shape-indicative marker is 
tubulin, and the second imaging agent is a fluorescently-labeled monoclonal 
antibody. 

90. The apparatus of claim 89, wherein the fluorescently-labeled monoclonal 
antibody is rhodamine-labeled DM1 alpha antibody.

91. The apparatus of claim 83, wherein the reference cell component marker and 
cell shape-indicative marker image parameters are a light or radiation intensity. 

92. The apparatus of claim 83, wherein the reference cell component marker and 
cell shape-indicative marker image parameters are an electromagnetic radiation 
intensity provided at a particular wavelength or range of wavelengths. 

93. A method of identifying boundaries of biological cells, the method
comprising:

receiving a first image of a field of one or more cells in which a reference cell
component of the one or more cells is identified by a reference cell component marker
image parameter;

receiving a second image of the field of one or more cells in which at least one
of a cell shape-indicative marker of the one or more cells is identified by a cell shape-
indicative marker image parameter;

thresholding the cell shape-indicative marker image parameter in the second 
image to generate a digital representation of the second image comprising a cell 
shape-indicative marker portion and a background portion; and 

identifying boundaries of individual cells by applying a watershed algorithm
to the second image using the reference cell component marker image parameter and
the background portion of the digital representation of the second image as seeds. 
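
A seeded watershed of the kind recited above might be sketched, as one possible illustration, with a priority-queue flooding scheme. The sketch assumes that the relief to be flooded is the inverted marker intensity (so bright cell interiors flood before dim gaps between cells), that each reference cell component seed carries a positive integer label, and that background seeds carry the label -1. These conventions, the 4-neighbour connectivity, and the function name are assumptions of the sketch.

```python
import heapq

def seeded_watershed(relief, seeds):
    """relief: 2D list of flooding heights (low floods first).
    seeds: 2D list of labels, 0 meaning 'not yet assigned'.
    Returns a 2D list in which every reachable pixel carries a seed label."""
    rows, cols = len(relief), len(relief[0])
    labels = [row[:] for row in seeds]
    heap, order = [], 0

    def push_neighbours(r, c, level):
        nonlocal order
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                # Flood level never decreases along a path from a seed.
                item = (max(level, relief[nr][nc]), order, nr, nc, labels[r][c])
                heapq.heappush(heap, item)
                order += 1

    for r in range(rows):
        for c in range(cols):
            if seeds[r][c] != 0:
                push_neighbours(r, c, relief[r][c])

    while heap:
        level, _, r, c, label = heapq.heappop(heap)
        if labels[r][c] == 0:            # first flood to arrive claims the pixel
            labels[r][c] = label
            push_neighbours(r, c, level)
    return labels

# Hypothetical one-row field: two cells separated by a dim gap.
marker = [[0, 8, 9, 5, 3, 9, 8, 0]]             # second-image intensity
relief = [[9 - v for v in row] for row in marker]
seeds = [[-1, 0, 1, 0, 0, 2, 0, -1]]            # nuclei 1 and 2, background -1
cells = seeded_watershed(relief, seeds)
```

In the example the boundary between the two cells falls at the dimmest marker pixel between the nuclei, and the background seeds keep the flooding from spilling outside the thresholded marker region.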

94. The method of claim 93, wherein the images of the field of one or more cells
are digital representations of the field of one or more cells.

95. The method of claim 94, wherein the reference cell component is present in
only a single copy in a cell.

96. The method of claim 95, wherein the reference cell component is selected
from the group consisting of nucleus, centrosome, a specific chromosome and Golgi
complex. 

97. The method of claim 93, wherein the reference cell component is a previously
segmented cell component.

98. The method of claim 97, wherein the reference cell component is present in
only a single copy in a cell. 

99. The method of claim 98, wherein the reference cell component is nucleus. 

100. The method of claim 93, wherein the cell shape-indicative marker is at least
one of a cytoskeletal, a cytoplasmic, and a plasma membrane marker.

101. The method of claim 100, wherein the cytoskeletal marker is tubulin.

102. A computer program product comprising a machine readable medium on
which is provided program instructions for identifying individual biological cells in a
field of cells, the instructions comprising:

code for receiving a first image of a field of one or more cells in which a
reference cell component of the one or more cells is identified by a reference cell
component marker image parameter;

code for receiving a second image of the field of one or more cells in which at
least one of a cell shape-indicative marker of the one or more cells is identified by a
cell shape-indicative marker image parameter;

code for thresholding the cell shape-indicative marker image parameter in the 
second image to generate a digital representation of the second image comprising a 
cell shape-indicative marker portion and a background portion; and 

code for identifying boundaries of individual cells by applying a watershed 
algorithm to the second image using the reference cell component marker image
parameter and the background portion of the digital representation of the second 
image as seeds. 